In [ ]:
# Convention for import to get shortened namespace
import numpy as np
In [ ]:
# Create an array for testing
a = np.arange(12).reshape(3, 4)
In [ ]:
a
Indexing in Python is 0-based, so the command below looks for the 2nd item along the first dimension (row) and the 3rd along the second dimension (column).
In [ ]:
a[1, 2]
You can also index along just the first dimension, which returns an entire row:
In [ ]:
a[2]
Negative indices are also allowed, which permit indexing relative to the end of the array.
In [ ]:
a[0, -1]
Slicing syntax is written as start:stop[:step], where all numbers are optional.
Note that stop is one past the last item selected; you can think of a slice as the half-open interval [start, stop).
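As a minimal sketch of the half-open convention described above, the cell below uses a small 1-D array (`x` is a hypothetical array made up for this illustration, not part of the notebook's data) to show that the stop index is excluded and that omitted numbers fall back to defaults.

```python
import numpy as np

# Hypothetical 1-D array used only for this illustration
x = np.arange(10)

# stop is exclusive: indices 2, 3 and 4 are selected, index 5 is not
middle = x[2:5]

# Omitted numbers fall back to defaults: start of the array,
# end of the array, and a step of 1
everything = x[:]
every_third = x[::3]

# Adjacent half-open slices split an array with no overlap and no gap
left, right = x[:4], x[4:]

print(middle, every_third, len(left) + len(right))
```

One consequence of the half-open interval is that, for a step of 1, the length of `x[start:stop]` is simply `stop - start`.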
In [ ]:
# Get the 2nd and 3rd rows
a[1:3]
In [ ]:
# All rows and 3rd column
a[:, 2]
In [ ]:
# ... can be used to replace one or more full slices
a[..., 2]
In [ ]:
# Slice every other row
a[::2]
In [ ]:
# You can also slice using negative indices
a[:, :-1]
In [ ]:
data = [1, 3, 5, 7, 9, 11]
out = []
# Look carefully at the loop. Think carefully about the sequence of values
# that data[i] takes--is there some way to get those values as a numpy slice?
# What about for data[i + 1]?
for i in range(len(data) - 1):
    out.append((data[i] + data[i + 1]) / 2)
print(out)
data = np.array([1, 3, 5, 7, 9, 11])
out = (data[:-1] + data[1:]) / 2
print(out)
data = np.array([1, 3, 5, 7, 9, 11])
out = (data[2:] + data[1:-1] + data[:-2]) / 3
print(out)
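The three-point average above can be cross-checked against a different technique. The sketch below swaps in `np.convolve` with a uniform window, which is not used in the notebook itself; it is included only to suggest that the shifted-slice expression computes the same running mean.

```python
import numpy as np

data = np.array([1, 3, 5, 7, 9, 11])

# Three-point average written with shifted slices, as in the cell above
sliced = (data[2:] + data[1:-1] + data[:-2]) / 3

# The same average as a convolution with a uniform window of length 3;
# mode='valid' keeps only positions where the window fits entirely
window = np.ones(3) / 3
convolved = np.convolve(data, window, mode='valid')

# Compare with a tolerance, since the two paths round differently
print(np.allclose(sliced, convolved))
```

The slicing form keeps the arithmetic explicit, while the convolution form is easier to extend to longer windows.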
In [ ]:
data = np.arange(12).reshape(3, 4)
# total = ?
print(data[0] + data[1] + data[2])
# Or we can use NumPy's sum with the "axis" argument
print(np.sum(data, axis=0))
The solution to the last exercise introduces an important concept when working with NumPy: the axis. This indicates the particular dimension along which a function should operate (provided the function reduces multiple values to a single value, as sum does).
Let's look at a concrete example with sum:
In [ ]:
a
In [ ]:
# This calculates the total of all values in the array
np.sum(a)
In [ ]:
# Keep this in mind:
a.shape
In [ ]:
# Instead, take the sum across the rows:
np.sum(a, axis=0)
In [ ]:
# Or do the same and take the sum across columns:
np.sum(a, axis=1)
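The axis argument is not specific to sum. As a rough sketch, the cell below applies two other reductions to the same 3 x 4 array and prints the resulting shapes, to show which dimension disappears in each case. The `keepdims` option is an addition not covered in the notebook text.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Reducing along axis 0 removes the first dimension: (3, 4) -> (4,)
col_max = np.max(a, axis=0)

# Reducing along axis 1 removes the second dimension: (3, 4) -> (3,)
row_mean = np.mean(a, axis=1)

# keepdims=True keeps the reduced axis with length 1: (3, 4) -> (3, 1)
row_sum_2d = np.sum(a, axis=1, keepdims=True)

print(col_max.shape, row_mean.shape, row_sum_2d.shape)
```

A helpful rule of thumb: the axis you name is the one that is collapsed in the result.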
In [ ]:
# Synthetic data
temp = np.random.randn(100, 50)
u = np.random.randn(100, 50)
v = np.random.randn(100, 50)
# Calculate the gradient components
gradx, grady = np.gradient(temp)
# Turn into an array of vectors:
# axis 0 is x position
# axis 1 is y position
# axis 2 is the vector components
grad_vec = np.dstack([gradx, grady])
print(grad_vec.shape)
# Turn wind components into vector
wind_vec = np.dstack([u, v])
# Calculate advection, the dot product of wind and the negative of gradient
# DON'T USE NUMPY.DOT (doesn't work). Multiply and add.
advec = (wind_vec * -grad_vec).sum(axis=-1)
print(advec.shape)
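The multiply-and-add step above is a dot product taken separately at every grid point. `np.dot` does something different for 3-D inputs: it pairs the last axis of its first argument with the second-to-last axis of its second. As a hedged alternative, the sketch below uses `np.einsum`, which is not part of the original solution, to express the same per-point sum. The random inputs and the seed are placeholders chosen only for this illustration.

```python
import numpy as np

# Placeholder vector fields with the same layout as above:
# axis 0 is x position, axis 1 is y position, axis 2 is the vector components
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
wind_vec = rng.standard_normal((100, 50, 2))
grad_vec = rng.standard_normal((100, 50, 2))

# Multiply and add, as in the solution above
advec = (wind_vec * -grad_vec).sum(axis=-1)

# Same calculation with einsum: multiply matching elements and
# sum over the component index k, keeping the grid indices i and j
advec_einsum = np.einsum('ijk,ijk->ij', wind_vec, -grad_vec)

print(advec.shape, np.allclose(advec, advec_einsum))
```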
In [ ]:
# Create some synthetic data representing temperature and wind speed data
np.random.seed(19990503) # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
50 + 2 * np.random.randn(100))
spd = (np.abs(10 * np.sin(np.linspace(0, 2 * np.pi, 100)) +
10 + 5 * np.random.randn(100)))
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(temp, 'tab:red')
plt.plot(spd, 'tab:blue');
By doing a comparison between a NumPy array and a value, we get an array of boolean values, one per element, giving the result of comparing that element with the value.
In [ ]:
temp > 45
We can take the resulting array and use this to index into the NumPy array and retrieve the values where the result was true
In [ ]:
print(temp[temp > 45])
As long as the shape of the boolean array matches the data, the boolean array can come from anywhere, such as a comparison on a different array.
In [ ]:
print(temp[spd > 10])
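To make the mechanics of boolean indexing easier to inspect, here is a small sketch on five hand-written values (hypothetical numbers, not the synthetic data above). It shows that the mask is a boolean array with the same shape as the data, that summing it counts the matches, and that a mask built from one array can select from another.

```python
import numpy as np

# Small hypothetical arrays so the results can be checked by eye
temp = np.array([50.0, 42.0, 47.5, 39.0, 61.0])
spd = np.array([12.0, 8.0, 15.0, 11.0, 3.0])

# The comparison gives one boolean per element
mask = temp > 45
print(mask.dtype, mask.shape)

# True counts as 1 when summed, so this is the number of matches
n_warm = mask.sum()

# A mask computed from spd can index temp because the shapes match
windy_temps = temp[spd > 10]

print(n_warm, windy_temps)
```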
In [ ]:
# Make a copy so we don't modify the original data
temp2 = temp.copy()
# Replace all places where spd is <10 with NaN (not a number) so matplotlib skips it
temp2[spd < 10] = np.nan
plt.plot(temp2, 'tab:red')
You can also combine multiple boolean arrays using the bitwise operators (& for and, | for or, ~ for not). Each comparison MUST be wrapped in parentheses, because these operators bind more tightly than comparisons.
In [ ]:
print(temp[(temp < 45) & (spd > 10)])
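The sketch below illustrates why the parentheses matter, using five hand-written values (hypothetical, chosen only for this example). Without parentheses, Python evaluates `45 & spd` first, which fails for floating-point arrays. `np.logical_and` is shown as an equivalent spelling that avoids the precedence question.

```python
import numpy as np

# Small hypothetical arrays so the results can be checked by eye
temp = np.array([50.0, 42.0, 47.5, 39.0, 61.0])
spd = np.array([12.0, 8.0, 15.0, 11.0, 3.0])

# Parenthesized comparisons combined element by element
both = (temp < 45) & (spd > 10)
either = (temp < 45) | (spd > 10)
neither = ~either

# Function form of the same "and", with no precedence pitfalls
same = np.logical_and(temp < 45, spd > 10)

# Without parentheses this parses as temp < (45 & spd) > 10 and fails
try:
    temp < 45 & spd > 10
    failed = False
except (TypeError, ValueError) as err:
    failed = True
    print(type(err).__name__)

print(temp[both])
```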
In [ ]:
# Here's the "data"
np.random.seed(19990503) # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
80 + 2 * np.random.randn(100))
rh = (np.abs(20 * np.cos(np.linspace(0, 4 * np.pi, 100)) +
50 + 5 * np.random.randn(100)))
# Create a mask for the two conditions described above
# good_heat_index =
# Use this mask to grab the temperature and relative humidity values that together
# will give good heat index values
# temp[] ?
# BONUS POINTS: Plot only the data where heat index is defined by
# inverting the mask (using `~mask`) and setting invalid values to np.nan
import numpy as np
# Here's the "data"
np.random.seed(19990503) # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
80 + 2 * np.random.randn(100))
rh = (np.abs(20 * np.cos(np.linspace(0, 4 * np.pi, 100)) +
50 + 5 * np.random.randn(100)))
# Create a mask for the two conditions described above
good_heat_index = (temp >= 80) & (rh >= 40)  # rh here is on a 0-100 percent scale
# Use this mask to grab the temperature and relative humidity values that together
# will give good heat index values
print(temp[good_heat_index])
# BONUS POINTS: Plot only the data where heat index is defined by
# inverting the mask (using `~mask`) and setting invalid values to np.nan
temp_masked = temp.copy()  # copy so later cells still see the original temp
temp_masked[~good_heat_index] = np.nan
plt.plot(temp_masked, 'tab:red')
In [ ]:
print(temp[0])
We can also extract the first, fifth, and tenth elements:
In [ ]:
print(temp[[0, 4, 9]])
One of the ways this comes into play is when sorting NumPy arrays using argsort. This function returns the indices of the array that give the items in sorted order. So for our temp "data":
In [ ]:
inds = np.argsort(temp)
print(inds)
We can use this array of indices to pass into temp to get it in sorted order:
In [ ]:
print(temp[inds])
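One reason to keep the indices instead of sorting directly is that the same indices can reorder a second, related array. The sketch below uses four hand-written values and a hypothetical `labels` array (neither is part of the notebook's data) to show this, and checks that indexing with the argsort result matches `np.sort`.

```python
import numpy as np

# Small hypothetical data: four values and a label for each one
values = np.array([52.1, 48.3, 55.7, 47.9])
labels = np.array(['a', 'b', 'c', 'd'])

# Indices that would put values in ascending order
inds = np.argsort(values)

# Indexing with those indices gives the same result as sorting directly
sorted_values = values[inds]
print(np.array_equal(sorted_values, np.sort(values)))

# The same indices reorder the labels so they stay paired with their values
labels_sorted = labels[inds]
print(inds, labels_sorted)
```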
Or we can slice inds to only give the 10 highest temperatures:
In [ ]:
ten_highest = inds[-10:]
print(temp[ten_highest])
There are other NumPy arg functions that return indices rather than values:
In [ ]:
np.*arg*?
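As a brief sketch of a few functions that the wildcard search above is likely to list, the cell below applies `np.argmax`, `np.argmin`, and `np.argwhere` to four hand-written values (hypothetical, not the notebook's data). The selection is illustrative, not a complete list.

```python
import numpy as np

# Small hypothetical data so the results can be checked by eye
values = np.array([52.1, 48.3, 55.7, 47.9])

# Index of the largest and smallest element
i_max = np.argmax(values)
i_min = np.argmin(values)

# Indices where a condition holds, returned as one row per match
above_50 = np.argwhere(values > 50)

print(i_max, i_min, above_50.ravel())
```

Each result can be passed back into `values` as an index, just like the output of argsort.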